40 research outputs found
Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited
This paper describes an experimental comparison between two standard
supervised learning methods, namely Naive Bayes and Exemplar-based
classification, on the Word Sense Disambiguation (WSD) problem. The aim of the
work is twofold. Firstly, it attempts to contribute to clarify some confusing
information about the comparison between both methods appearing in the related
literature. In doing so, several directions have been explored, including:
testing several modifications of the basic learning algorithms and varying the
feature space. Secondly, an improvement of both algorithms is proposed, in
order to deal with large attribute sets. This modification, which basically
consists in using only the positive information appearing in the examples,
allows to improve greatly the efficiency of the methods, with no loss in
accuracy. The experiments have been performed on the largest sense-tagged
corpus available containing the most frequent and ambiguous English words.
Results show that the Exemplar-based approach to WSD is generally superior to
the Bayesian approach, especially when a specific metric for dealing with
symbolic attributes is used.Comment: 5 page
Boosting Applied to Word Sense Disambiguation
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied
to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of
15 selected polysemous words show that the boosting approach surpasses Naive
Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy
on supervised WSD. In order to make boosting practical for a real learning
domain of thousands of words, several ways of accelerating the algorithm by
reducing the feature space are studied. The best variant, which we call
LazyBoosting, is tested on the largest sense-tagged corpus available containing
192,800 examples of the 191 most frequent and ambiguous English words. Again,
boosting compares favourably to the other benchmark algorithms.Comment: 12 page
Fully Automated Fact Checking Using External Sources
Given the constantly growing proliferation of false claims online in recent
years, there has been also a growing research interest in automatically
distinguishing false rumors from factually true claims. Here, we propose a
general-purpose framework for fully-automatic fact checking using external
sources, tapping the potential of the entire Web as a knowledge source to
confirm or reject a claim. Our framework uses a deep neural network with LSTM
text encoding to combine semantic kernels with task-specific embeddings that
encode a claim together with pieces of potentially-relevant text fragments from
the Web, taking the source reliability into account. The evaluation results
show good performance on two different tasks and datasets: (i) rumor detection
and (ii) fact checking of the answers to a question in community question
answering forums.Comment: RANLP-201
Automatic Stance Detection Using End-to-End Memory Networks
We present a novel end-to-end memory network for stance detection, which
jointly (i) predicts whether a document agrees, disagrees, discusses or is
unrelated with respect to a given target claim, and also (ii) extracts snippets
of evidence for that prediction. The network operates at the paragraph level
and integrates convolutional and recurrent neural networks, as well as a
similarity matrix as part of the overall architecture. The experimental
evaluation on the Fake News Challenge dataset shows state-of-the-art
performance.Comment: NAACL-2018; Stance detection; Fact-Checking; Veracity; Memory
networks; Neural Networks; Distributed Representation
Fact Checking in Community Forums
Community Question Answering (cQA) forums are very popular nowadays, as they
represent effective means for communities around particular topics to share
information. Unfortunately, this information is not always factual. Thus, here
we explore a new dimension in the context of cQA, which has been ignored so
far: checking the veracity of answers to particular questions in cQA forums. As
this is a new problem, we create a specialized dataset for it. We further
propose a novel multi-faceted model, which captures information from the answer
content (what is said and how), from the author profile (who says it), from the
rest of the community forum (where it is said), and from external authoritative
sources of information (external support). Evaluation results show a MAP value
of 86.54, which is 21 points absolute above the baseline.Comment: AAAI-2018; Fact-Checking; Veracity; Community-Question Answering;
Neural Networks; Distributed Representation